docs: EP-1270 Authorization (access control) design proposal #2075
davidkarlsen wants to merge 2 commits into
Pull request overview
Adds a new Enhancement Proposal (EP-1270) documenting a design for introducing fine-grained authorization (access control) in KAgent, centered on CEL-based policy evaluation while preserving the existing auth.Authorizer seam for pluggable implementations.
Changes:
- Introduces EP-1270 documenting current authorization gaps and the proposed CEL-based default authorizer.
- Specifies a policy model, decision context, and rollout strategy (opt-in, fail-closed, cached compilation).
- Outlines operational considerations (list filtering, A2A gating) and an initial test plan.
430afb4 to c86e289
Proposes a real Authorizer to replace the open-by-default NoopAuthorizer: CEL-based, in-process, behind the existing auth.Authorizer interface, with per-resource policy on the Agent CR compiled via reconciliation and a default-deny model. Builds on the stalled prototypes in kagent-dev#1766 and kagent-dev#1370.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: David J. M. Karlsen <david@davidkarlsen.com>
Address PR review: ProxyAuthenticator only populates Principal.Claims for direct user calls; the agent-call path (X-Agent-Name) sets User/Agent but not Claims. Qualify the Background statement and strengthen Open Question #5: a claims-only fail-closed policy would deny internal agent/M2M traffic, so the model needs an agent-identity match or a separate M2M lane.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: David J. M. Karlsen <david@davidkarlsen.com>
c86e289 to a5366a4
@dimetron PTAL?
The proposal picks the right engine (CEL) and reuses the existing
What's good (approve)
What he missed (must address before
| Concern | Where | Why |
|---|---|---|
| Coarse gate, "may you touch this resource type and verb at all" | Middleware (deny-by-default) | One chokepoint; closes the gap structurally |
| List filtering, per returned item | Handler | Needs the response set, not just the request |
| Create where name and namespace are in the body | Handler | Middleware sees path vars, not the decoded body |
| Per-resource policy combining (Agent `spec.accessPolicy`) | Handler | Needs the fetched resource plus central-vs-resource combining |
| Non-uniform routes (A2A `/{ns}/{name}`, sessions and memory keyed by agent) | Both, with an explicit route entry | Resource identity is not inferable from the path shape alone |
This needs two pieces: a declarative registry that maps each route to its
resource type, verb, and whether it is public, and an explicit public allowlist
(/health, /version, and the self-scoped /api/user) so probes and
self-calls keep working.
Why it is worth the refactor: a missing registry entry fails closed (the request
is denied), whereas a missing Check() today fails open (the request goes
through). That asymmetry is the whole reason to do it. The same middleware also
covers the A2A PathPrefix handler (server.go:347) instead of leaving it to a
separate hand-wired gate. The EP should state this enforcement-model choice
explicitly rather than inheriting the implicit per-handler one.
Smaller suggestions
- Keep `authorizer` set to `noop` whenever `auth.mode=unsecure`. Dev clusters have no `claims`, so a claims-based policy would lock them out.
- Add `Namespace` to `auth.Resource`, or normalize it once inside the authorizer, so handlers stop re-parsing `namespace/name`.
- Add an observability section: decision metrics (`kagent_authz_decisions_total{result,resource_type}`, `kagent_authz_config_valid`) and deny logging at `V(1)` that never prints claim values.
- Add a threat-model and trust-boundary paragraph. The proxy validates the JWT and the controller trusts the proxy. List what authz does not cover (direct pod access, tracked in "Secure the kagent UI" #2028, and the ungated endpoints).
- Add break-glass and bootstrap-admin guidance so turning authz on against a live cluster does not lock everyone out.
- Note that the Casbin sections in EP-476 are superseded by EP-1270.
- Note the size limits: a CEL string in `spec.accessPolicy` counts against the etcd object limit, the central ConfigMap is capped at 1 MiB, and the compiled-program cache currently only evicts on delete.
Authorization gates matrix
Verified against main by counting Check() and authorizeAgentRequest calls
per handler file in go/core/internal/httpserver/handlers.
| Handler | Routes (examples) | Gates today | Sensitivity | Risk if CEL enabled |
|---|---|---|---|---|
| `agents.go` | `/api/agents/*` | yes, ~12 (incl. `authorizeAgentRequest`) | high | covered |
| `modelconfig.go` | `/api/modelconfigs/*` | yes, 5 | high (cred refs) | covered |
| `prompttemplates.go` | `/api/prompttemplates/*` | yes, 5 | medium | covered |
| `toolservers.go` | `/api/toolservers/*` | yes, 4 | medium | covered |
| `toolservertypes.go` | `/api/toolservertypes/*` | yes, 1 | low | covered |
| `mcpapps.go` | `/api/mcpapps/*` | yes, 1 | low | covered |
| `substrate.go` | `/api/substrate/*` | yes, 1 | medium | covered |
| `sessions.go` | `/api/sessions/*` | none | high (conversation content) | bypass |
| `memory.go` | `/api/memories/*` | none | high (embeddings, PII) | bypass |
| `tasks.go` | `/api/tasks/*` | none | high (task data) | bypass |
| `checkpoints.go` | LangGraph checkpoints | none | high (state) | bypass |
| `modelproviderconfig.go` | `/api/modelproviderconfigs/*` | none | high (credential-adjacent) | bypass |
| `models.go` | `/api/models` | none | medium | bypass |
| `namespaces.go` | `/api/namespaces` | none | medium (enumeration) | bypass |
| `tools.go` | `/api/tools` | none | low to medium | bypass |
| `feedback.go` | `/api/feedback/*` | none | low | bypass |
| `crewai.go` | CrewAI routes | none | medium | bypass |
| `agentharness_gateway.go` | `/api/agentharnesses/...` | none | medium | bypass |
| `agentharness_session.go` | harness sessions | none | medium | bypass |
| `companion_secrets.go` | companion secrets | none | high (secrets) | bypass |
| `current_user.go` | `/api/user` | none | low (self) | acceptable |
| `health.go` | `/health`, `/version` | none (by design) | none | acceptable |
| A2A invoke | `/api/a2a/{ns}/{name}` | authn only (`A2AAuthenticator`), no `Authorizer` | high (direct agent run) | bypass |
Summary: 7 of the 22 handler files in the matrix gate authz today, mainly CRUD on
Agent, ModelConfig, ToolServer, and PromptTemplate. The ungated rest includes the
most sensitive surfaces: sessions, memory, companion secrets, model-provider
config, and A2A invoke. Turning on CELAuthorizer without the coverage work
leaves those as bypass paths, which is why the "wired into every handler" line
has to become a real coverage matrix and a scope commitment.
Bottom line
The direction is right: CEL as the default behind the existing interface. Merge
it as provisional. Before it moves to accepted:
- Replace the coverage claim with the matrix above, commit to closing the sensitive gaps, and adopt a deny-by-default authz middleware (hybrid with per-handler `Check()`) so the gap cannot come back.
- Move M2M principals and policy combining out of "Open Questions" and into resolved design. For M2M, adopt verified workload identity (projected SA token or SPIFFE) bound into the principal, not the `X-Agent-Name` header.
- Add the observability and threat-model sections.
References
Related PRs and issues:
- #1766: per-agent `kagent.dev/allowed-groups` annotation, `GroupAuthorizer`, agent-list filtering, and A2A request gating (stalled on inactivity).
- #1370: pluggable external authorizer (OPA-style webhook) behind the `Authorizer` interface (stalled).
- #1293: OIDC proxy authentication (EP-476), which added the `trusted-proxy` mode this EP builds on.
- #2071: STS and delegated identity, relevant to propagating caller claims on the agent path.
- #2028: network gating (HTTPRoute, NetworkPolicy, OpenShift Route), the out-of-scope edge controls.
Source locations (verified against main):
- `NoopAuthorizer.Check` returns `nil`: `go/core/internal/httpserver/auth/authz.go`.
- `Authorizer` interface, the `Verb` set (get/create/update/delete), `Resource{Name, Type}` with no `Namespace` field, and `Principal.Claims`: `go/core/pkg/auth/auth.go` (`Verb` at lines 9-16, `Resource` at 18-21, `Principal` at 32-36).
- Central `Check` helper and the HTTP-method-to-verb switch: `go/core/internal/httpserver/handlers/helpers.go:56-80`.
- Middleware chain, where an `AuthzMiddleware` would sit next to `AuthnMiddleware`: `go/core/internal/httpserver/server.go:356-360`.
- A2A registered as a `PathPrefix` handler with authentication only, no authorizer: `go/core/internal/httpserver/server.go:347`.
- `cel-go` already in the module graph (indirect today, promote to direct when implementing): `go/go.mod`.
- Agent status conditions, where `AccessPolicyValid` would be reported: `go/api/v1alpha2/agent_types.go`.
Summary
Adds an Enhancement Proposal for authorization (access control) in KAgent — issue #1270.
Today the controller ships with `NoopAuthorizer`, so once a user is authenticated they can list, invoke, edit and delete every Agent, ModelConfig and ToolServer across every namespace. Enabling OIDC (#1293) gives authentication but no access control. This EP proposes the fine-grained authorization that EP-476 explicitly deferred.
Approach
The earlier #1270 discussion stalled on a design tension: an opinionated in-process RBAC engine vs. a pluggable extension point. The EP proposes CEL as the resolution — it's both:
- (`cel-go` is already in our module graph), and
- `claims`/`verb`/`resource`, so groups are one option among many and the project isn't married to one engine.
The `auth.Authorizer` interface stays the seam, so an external/OPA authorizer (#1370) remains pluggable. Per-resource policy lives on the Agent CR, compiled via reconciliation (cached, validated onto `status.conditions`), enforced centrally. Builds on the stalled prototypes in #1766 (per-agent annotation + list filtering + A2A gating) and #1370 (external authorizer interface) rather than starting over.
Design comment that led here: #1270 (comment)
Status
`provisional`, following the "merge early and iterate" guidance in the EP template. High-level direction is the goal; details (per-resource carrier, policy-combining semantics, default-deny behavior) are flagged as Open Questions / `UNRESOLVED` for discussion.
Looking for a maintainer sponsor and a directional 👍 on "CEL as the default, behind the existing interface."
/cc @EItanya @peterj
🤖 Generated with Claude Code